spark-server: respect MiniMax auto tool choice #104
@lesserevil heads-up: #108 (your bare-json auto fix) just landed, and it refactored the same resolver this PR rewrites, so the two now collide in sampling_setup.rs. This is more than a textual conflict: the two disagree on intent.
I think your reasoning here in #104 is the more principled one (the trigger-token argument is convincing), but I do not want to merge a change to the per-parser required/auto semantics for two different models (Nemotron bare_json and MiniMax) by resolving the conflict myself and guessing. Could you rebase this onto current main and fold both behaviors into the one resolver, so that bare_json -> required and minimax_xml -> auto is the explicit, single source of truth? The suppressed_for_turn / loop-suppression part is a clean addition and I want to keep it. Once it is rebased, I will validate the tool-choice behavior live on MiniMax-M2.7-NVFP4 (cached here) before merging. The resolve_tool_mode abstraction with its unit table is the right shape; it just needs to be the only resolver.
@lesserevil closing this one for queue hygiene, not because the idea is wrong. #108 (your bare-json fix) landed and refactored the same resolver, so this now conflicts, and the two disagree on the per-parser semantics (this PR: bare_json required, minimax auto; #108 as merged: the opposite). Your trigger-token reasoning here is the more principled version, and the resolve_tool_mode struct plus the suppressed_for_turn loop-suppression is a clean addition I want. Please refile it rebased onto current main as the single source-of-truth resolver (bare_json -> required, minimax_xml -> auto, plus the loop-suppression), and I will validate the tool-choice behavior live on MiniMax-M2.7-NVFP4 (cached here) and merge it. Sorry to make you rebase; the timing of #108 landing first is on me.
Summary
MiniMax chat requests now resolve an effective tool mode before sampling, so
`tool_choice: "auto"` remains optional instead of forcing `<tool_call>` generation. Loop suppression also disables tool-call grammar for one recovery turn unless the caller explicitly requires a tool call.
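A minimal sketch of the single-resolver shape discussed above, assuming hypothetical enums (`ToolParser`, `ToolChoice`, `ToolMode`) and a `resolve_tool_mode(parser, choice, suppressed_for_turn)` signature; these are illustrative, not the actual types in `sampling_setup.rs`. It encodes bare_json -> required and minimax_xml -> auto under `tool_choice: "auto"`, lets explicit requests bypass loop suppression, and keeps the per-parser cases in a table-driven unit test.

```rust
// Hypothetical sketch: names and variants are illustrative, not the real
// spark-server types.

/// Tool-call output format the parser expects (illustrative subset).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ToolParser {
    /// Tool calls are bare JSON objects with no dedicated trigger token.
    BareJson,
    /// Tool calls open with a `<tool_call>` trigger token (MiniMax XML).
    MinimaxXml,
}

/// The caller's OpenAI-style `tool_choice`.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum ToolChoice {
    None,
    Auto,
    Required,
    Function(String),
}

/// Effective mode handed to sampling.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ToolMode {
    /// No tool-call grammar is applied.
    Disabled,
    /// Tool-call grammar is armed, but plain text is still allowed.
    Optional,
    /// The model must emit a tool call.
    Required,
}

pub fn resolve_tool_mode(
    parser: ToolParser,
    choice: &ToolChoice,
    suppressed_for_turn: bool,
) -> ToolMode {
    match choice {
        ToolChoice::None => ToolMode::Disabled,
        // Explicit requests always win, even during loop suppression.
        ToolChoice::Required | ToolChoice::Function(_) => ToolMode::Required,
        // Loop suppression switches tools off for one recovery turn.
        ToolChoice::Auto if suppressed_for_turn => ToolMode::Disabled,
        ToolChoice::Auto => match parser {
            // Assumption: with no trigger token there is no unambiguous point
            // at which to enter tool-call mode, so bare JSON stays required.
            ToolParser::BareJson => ToolMode::Required,
            // `<tool_call>` is an explicit trigger, so `auto` stays optional.
            ToolParser::MinimaxXml => ToolMode::Optional,
        },
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn per_parser_tool_mode_table() {
        // (parser, tool_choice, suppressed_for_turn, expected)
        let cases = [
            (ToolParser::BareJson, ToolChoice::Auto, false, ToolMode::Required),
            (ToolParser::MinimaxXml, ToolChoice::Auto, false, ToolMode::Optional),
            (ToolParser::MinimaxXml, ToolChoice::Auto, true, ToolMode::Disabled),
            (ToolParser::MinimaxXml, ToolChoice::Required, true, ToolMode::Required),
            (ToolParser::BareJson, ToolChoice::Function("lookup".to_string()), true, ToolMode::Required),
            (ToolParser::MinimaxXml, ToolChoice::None, false, ToolMode::Disabled),
        ];
        for (parser, choice, suppressed, want) in cases {
            assert_eq!(
                resolve_tool_mode(parser, &choice, suppressed),
                want,
                "{parser:?} {choice:?} suppressed={suppressed}"
            );
        }
    }
}
```

Keeping the table next to the single `match` means a semantics change like the #104/#108 disagreement shows up as a one-line diff in the test table rather than as a silent behavior change in a second resolver.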
Test plan
- `cargo fmt --all -- --check`
- `LIBRARY_PATH=/opt/vllm/nccl-blackwell/lib LD_LIBRARY_PATH=/opt/vllm/nccl-blackwell/lib ATLAS_SKIP_BUILD=1 cargo test -p spark-server sampling_tool_mode -- --nocapture`
- `LIBRARY_PATH=/opt/vllm/nccl-blackwell/lib LD_LIBRARY_PATH=/opt/vllm/nccl-blackwell/lib ATLAS_SKIP_BUILD=1 CUDARC_CUDA_VERSION=13000 cargo clippy -p spark-server --tests -- -Dwarnings`
- `ATLAS_SKIP_BUILD=1 cargo clippy --workspace --tests --all-features -- -Dwarnings` could not complete on this Linux host because the workspace hits the pre-existing `objc2` Apple-platform compile gate before this change.
- `bash scripts/check-license-headers.sh` could not run because `scripts/check-license-headers.sh` is not present in this checkout.
- `typos` could not run because `typos` is not installed on this host.
- Live check on `nvidia/MiniMax-M2.7-NVFP4` with EP=2: a Hermes `hello!` returned a normal assistant response with no repeated recall/tool loop, and a direct synthetic loop-suppression request returned assistant text while logging tool-call grammar suppression.
This keeps bare-JSON tool mode required, but MiniMax XML tools now follow normal OpenAI `auto` semantics. A positive `<tool_call>` logit bias is only applied when a tool call is required, and loop suppression now hard-masks tool-call start tokens; this is safe because explicit required/specific requests are filtered out before reaching the scheduler.

Benchmarks: not run; this is a tool-choice correctness fix, not a performance-oriented change.
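The logit rule described in these notes can be sketched as follows, assuming logits arrive as a mutable `f32` slice indexed by token id and that the tool-call start token ids (for MiniMax, the id(s) of `<tool_call>`) are already known. `shape_tool_call_logits`, its parameters, and the bias value are hypothetical. Because explicit required/specific requests are filtered out before reaching the scheduler, the sketch masks unconditionally whenever the suppression flag is set.

```rust
// Hypothetical sketch: names, token ids, and the bias value are illustrative,
// not spark-server's actual sampling code.

/// Effective tool mode produced by an upstream resolver.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ToolMode {
    Disabled,
    Optional,
    Required,
}

/// Adjusts next-token logits in place for the tool-call start tokens.
/// Token ids outside the vocabulary are ignored.
pub fn shape_tool_call_logits(
    logits: &mut [f32],
    tool_call_start_ids: &[usize],
    mode: ToolMode,
    suppressed_for_turn: bool,
    required_bias: f32,
) {
    for &id in tool_call_start_ids {
        if let Some(logit) = logits.get_mut(id) {
            if suppressed_for_turn {
                // Loop suppression: hard-mask so the model cannot open a tool
                // call this turn. Explicit required/specific requests are
                // assumed to be filtered out before reaching this point.
                *logit = f32::NEG_INFINITY;
            } else if mode == ToolMode::Required {
                // Only a required tool call gets a positive nudge.
                *logit += required_bias;
            }
            // Optional (OpenAI `auto`) and Disabled: in this sketch the
            // model's own preference is left untouched.
        }
    }
}
```

Masking to negative infinity, rather than applying a large negative bias, guarantees the start token has zero probability after softmax, which is what a one-turn recovery from a tool-call loop needs.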
Authorship: AI-generated by Codex under human operator direction; no human-written code sections are claimed.
CLA